Word Model Adaptation in Voice-Activated Teleservices

Authors

  • Robert van Kommer
  • Beat Hirsbrunner
Abstract

In voice-activated teleservices, two types of speech recognition systems are commonly used: (tri)phone-based recognizers and whole-word model recognizers. While the first type makes it easy to recognize any new vocabulary word, the second achieves a higher performance level when the necessary word-specific training data is available. In order to bridge the performance gap between these two systems, this paper describes a new adaptation method based on a two-tier speech model. More specifically, the first tier of the model architecture performs the phone-based recognition, while the second tier implements the adaptation to the whole-word models. Experimental results for connected word recognition are presented for two cases: (i) a hybrid NN/HMM recognition system, and (ii) the new two-tier hybrid system, which is implemented through a Multirate Neural Network (MNN) front-end. According to the results obtained within the described experimental settings, the adaptation method reduces the error rate by more than 80%. Furthermore, the new system is an example of modular spatiotemporal modeling of speech.
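
The two-tier idea in the abstract can be illustrated with a small sketch. This is not the authors' MNN-based system: it assumes the first tier is already available as per-frame phone log-probabilities, and it stands in for the second tier with a hypothetical per-word additive offset on those phone scores. The names `align_score`, `recognize`, `word_bias`, `frames`, and `lexicon` are illustrative only.

```python
import math

NEG_INF = float("-inf")

def align_score(log_post, phone_seq):
    """Best monotone left-to-right alignment of frames to a phone sequence.

    log_post: list of frames, each a list of per-phone log-probabilities
              (assumed to come from a phone-based first tier).
    phone_seq: list of phone indices spelling one word.
    """
    n_states = len(phone_seq)
    prev = [NEG_INF] * n_states
    prev[0] = log_post[0][phone_seq[0]]
    for t in range(1, len(log_post)):
        cur = [NEG_INF] * n_states
        for s in range(n_states):
            # Either stay in the same phone or advance from the previous one.
            best = prev[s] if s == 0 else max(prev[s], prev[s - 1])
            cur[s] = best + log_post[t][phone_seq[s]]
        prev = cur
    return prev[n_states - 1]

def recognize(log_post, lexicon, word_bias=None):
    """Score every word; optionally apply a word-specific offset (assumed tier 2)."""
    scores = {}
    for word, phones in lexicon.items():
        adapted = log_post
        if word_bias and word in word_bias:
            # Hypothetical adaptation: shift phone scores for this word only.
            bias = word_bias[word]
            adapted = [[lp + b for lp, b in zip(frame, bias)] for frame in log_post]
        scores[word] = align_score(adapted, phones)
    return max(scores, key=scores.get), scores

# Toy data (invented for illustration): 4 frames, 3 phones.
frames = [[math.log(p) for p in row] for row in
          [[0.7, 0.2, 0.1], [0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.1, 0.8, 0.1]]]
lexicon = {"yes": [0, 1], "no": [2, 1]}
best_word, word_scores = recognize(frames, lexicon)
```

The point of the sketch is the modular split: the phone-level scores are shared by every word, so new vocabulary only needs a lexicon entry, while any word-specific adaptation is layered on top without retraining the first tier.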

Similar resources

Similarity Based Language Model Construction for Voice Activated Open-Domain Question Answering

This paper describes a novel method of constructing a language model for speech recognition of inputs with a particular style, using a large-scale Web archive. Our target is an open domain voice-activated QA system and our speech recognition module must recognize relatively short, domain independent questions. The central issue is how to prepare a large scale training corpus with low cost, and ...

SpeechDat multilingual speech databases for teleservices: across the finish line

The goal of the SpeechDat project is to develop spoken language resources for speech recognisers suited to realise voice driven teleservices. SpeechDat created speech databases for all official languages of the European Union and some major dialectal varieties and minority languages. The size of the databases ranges between 500 and 5000 speakers. In total 20 databases are recorded over the fixe...

VAD-measure-embedded decoder with online model adaptation

We previously proposed a decoding method for automatic speech recognition utilizing hypothesis scores weighted by voice activity detection (VAD)-measures. This method uses two Gaussian mixture models (GMMs) to obtain confidence measures: one for speech, the other for non-speech. To achieve good search performance, we need to adapt the GMMs properly for input utterances and environmental noise. ...

SpeechDat Experiences in Creating Large Multilingual Speech Databases for Teleservices

In this article, experiences in creating large multilingual speech databases for teleservices within a large consortium are reported in order to inspire, to facilitate or to compare the set-up and progress of other enterprises for collecting large speech databases. The focus will be on the following aspects: Objectives, benefits, and strategy; project organization; database contents and creation; va...

MAP Based Speaker Adaptation in Very Large Vocabulary Speech Recognition of Czech

The paper deals with the problem of efficient adaptation of speech recognition systems to individual users. The goal is to achieve better performance in specific applications where one known speaker is expected. In our approach we adopt the MAP (Maximum A Posteriori) method for this purpose. The MAP based formulae for the adaptation of the HMM (Hidden Markov Model) parameters are described. Sev...

Publication date: 2001